Paxos is a consensus algorithm. We all are well aware of the notion of consensus from a common situation—a group of people deciding on a food place for lunch. In the field of computer science, Paxos helps multiple processes agree on a single value from multiple proposed values. It has various applications in distributed systems where multiple nodes work together and need to make a common decision for various purposes.

The goal of the Paxos algorithm is to eventually decide on one value.

svg viewer

Working of Paxos#

When a group of people is deciding on a place to have lunch, there are usually three types of people involved. Some are knowledgeable about restaurants and will suggest places, some will agree with the suggestions of others, and some will wait for the final decision without providing any input. In Paxos, these roles are given the names proposers, acceptors, and learners.

Roles#

Let's define these roles below:

Proposers: Proposers send their proposals to the other replica group members. Each proposer tries to have the majority of members to support their proposal. Clients hand over commands to one of the proposers, which initiates the Paxos algorithm to get the client's value accepted.

Acceptors: Acceptors are the group members who receive proposals and accept or reject them.

Learners: Learners keep track of accepted values. When the same value has been accepted at a majority, the consensus is achieved and these commands are then marked as chosen. The local state machine can execute that specific command.

Note: We will assume that each replica plays all three roles of proposer, acceptor, and learner. However, it is possible to give specific replicas specific roles, having all three components everywhere simplifies our discussion and actual implementation as well.

Point to ponder

Question

Why can’t we have just one acceptor?

Hide Answer

If a single acceptor fails after accepting a value, no one in the system is aware of previous consensus until the failed node comes back online.

This is why the Paxos algorithm uses a quorum of majority acceptors. For the 2t+12t + 1 acceptor group, Paxos can tolerate the failure of tt acceptors. Doing so makes Paxos fault-tolerant and increases its availability. The nodes that had failed also catch up with the majority decision when they come back. The Paxos algorithm is designed in such a way that once a majority has agreed on a value, the minority can not change that decision. It is one of the main safety conditions of Paxos.

Let's see how proposers reach a consensus on a single value.

Protocol steps#

The Paxol protocol consists of two steps. The first step is for the proposers to gain majority votes from the acceptors on a proposal number without proposing any value. In the second step, the proposer that receives majority votes proceeds to propose the value and requests the majority of acceptors (the ones that voted for it) to accept that value. If the majority of acceptors accept the value, then that value is chosen.

The two phases of Paxos protocol
The two phases of Paxos protocol

Point to ponder

Question

Why are two phases necessary in the basic Paxos algorithm? Why can’t we have just one phase?

Hide Answer

Let’s see what could go wrong if we had only one phase of accepting a value.

Split vote: If each acceptor accepts only the first value proposed by any proposer, we can have a situation where no value is chosen because no one has the majority. In the following example, we have five servers, and three is a majority to get a value accepted. However, due to a split vote scenario, none of the three proposers have a majority, so no value is chosen.

Conflicting choices: If we let acceptors accept any value, they get in trouble as well. If we could accept values in one phase, it would also be possible to encounter a situation where the accepted value is changed. Consider the example below. Server 3 accepts the blue value after accepting the red value. As a result, the chosen value changes from red to blue.

Our next example shows that having two phases in itself is not sufficient to enforce safety. In the following example, both red and blue proposers see no other accepted values and move to their second phases. But the red accepted messages get delayed, and the blue value ends up being chosen. However, later on, server 3 changes its decision when red messages come in, resulting in a change in the chosen value.

In summary, we learned with examples that having two phases is necessary, but this rule is not sufficient by itself (we need more accompanying rules).

We have the following three observations about the Paxos protocol due to the above question:

  1. Two phases are necessary to enforce safety properties.

  2. Once a value has been chosen (meaning accepted at a majority), all future proposals for that slot must propose/choose the same chosen value.

  3. Proposals must be ordered such that new proposals take precedence and any old proposals are rejected.

Proposal numbers#

Proposal numbers play a vital role in the Paxos algorithm. They act as logical clocks to order different proposals. To totally order proposals, we tag the server identifiers with the proposal as lower order bits. For example, the first proposal by server 5 will be 1.5 (we use dotted notation in the text for readability only).

svg viewer

The higher order bits of the proposal contain the round number. A round number is a monotonically increasing integer that can start from 1 and as per the interaction with the acceptors, will bump the round numbers either serially (such as 2, 3, etc.) or by using a round number higher than what was received by some acceptor (for example, 5).

Point to ponder

Question

Why is the server ID placed at the lower order bits and not the higher order bits of the proposal number?

Hide Answer

Whatever we put in the higher order bits have greater precedence in our choices. If we put round numbers in the higher order bits, they play a dominant role, else the server identifier with a higher number plays a bigger role.

However, putting the server identifier in the lower order bits seems better because we often have 3,5, or 7 replicas. This means we need a fixed number of bits for the servers (for example, 3 bits for 7 servers). On the other hand, the proposal numbers can increase unbounded for dueling proposers (that we will see later). So the round number part of the proposal needs more flexibility in terms of the number of bits.

Step 1: Handshake phase (prepare-promise messages)#

The protocol starts with a prepare request sent by each proposer to all acceptors to support their proposal. The acceptors, in response, send a promise message or reject the request.

Request#

Proposers choose unique proposal numbers using an incremental unique ID generator function and send a prepare(proposal_number) request to all acceptors.

'''
Request by a proposer to a set of acceptors containing majority acceptors
'''

proposal_number = proposer.generateIncUniqNum()
proposer.send_prepare(proposal_number)
def send_prepare(proposal_number):
    '''
    Broadcasts a Prepare message to all Acceptors
    '''

Response#

An acceptor, on receiving a prepare request, responds to the proposer based on the following conditions:

  • Condition 1: If the acceptor hasn’t yet promised to any other prepare request, in that case, it will record the proposal number to its persistent storage and send a promise message to the proposer in response to the prepare request. The promise message serves as a guarantee for the proposer provided by the acceptor that it wouldn’t promise to prepare requests with smaller proposal numbers in the future.
def recv_prepare(proposer_ID, proposal_number):
    '''
    Called when a Prepare message is received from a Proposer
    '''
    if acceptor.promised_proposal == None:
        acceptor.promised_proposal = proposal_number
        acceptor.send_promise(proposer_ID, proposal_number, None, None)

def send_promise(proposer_ID, proposal_number, accepted_proposal_number, accepted_value):
        '''
        Sends a Promise message to the specified Proposer
        '''
  • Condition 2: If the acceptor has already promised to a prepare request but with a lower proposal number, it will make a promise to the current prepare request with the higher proposal number, and will break the promise with the already promised request. (And if the acceptor has accepted any proposal, it will also include in its response the accepted proposal number and the corresponding accepted value as discussed in step 2.)
def recv_prepare(proposer_ID, proposal_number):
    '''
    Called when a Prepare message is received from a Proposer
    '''
    if acceptor.promised_proposal != None && acceptor.promised_proposal < proposal_number && acceptor.accepted_proposal_number == None:
        acceptor.promised_proposal = proposal_number
        acceptor.send_promise(proposer_ID, proposal_number, acceptor.accepted_proposal_number, acceptor.accepted_value)

def send_promise(proposer_ID, proposal_number, accepted_proposal_number, accepted_value):
        '''
        Sends a Promise message to the specified Proposer
        '''

  • Condition 3: If the acceptor has already promised to a prepare request with the higher-numbered proposal, it will simply reject this request and respond to the proposer with the already promised proposal number so that the proposer can try again with the higher-numbered proposal.
def recv_prepare(proposer_ID, proposal_number):
    '''
    Called when a Prepare message is received from a Proposer
    '''
    if acceptor.promised_proposal != None && acceptor.promised_proposal > proposal_number:
        acceptor.reject(proposer_ID, proposal_number, acceptor.promised_proposal)

def reject(proposer_ID, proposal_number, promised_proposal_number):
        '''
        Sends an error message to the specified Proposer
        '''
        print("Try with a proposal number higher than " + promised_proposal_number "!")

  • Condition 4: If the acceptor has already accepted an accept request with a lower proposal number, it will make a promise to the current prepare request with the higher proposal number and send in response the accepted proposal number and the corresponding value along with the promise message. This allows the current proposer to propose the same value that is already in motion.
def recv_prepare(proposer_ID, proposal_number):
    '''
    Called when a Prepare message is received from a Proposer
    '''
    if acceptor.accepted_proposal_number != None && acceptor.promised_proposal < proposal_number:
        acceptor.promised_proposal = proposal_number
        acceptor.send_promise(proposer_ID, proposal_number, acceptor.accepted_proposal_number, acceptor.accepted_value)

def send_promise(proposer_ID, proposal_number, accepted_proposal_number, accepted_value):
        '''
        Sends a Promise message to the specified Proposer
        '''

Step 2: Value acceptance phase (accept-accepted messages)#

The second step of the protocol is for the proposers to propose a value and request the acceptors to accept that value. And if the proposer is able to receive acceptance from the majority of acceptors, that value is chosen. However, in case of rejection, proposals need to restart phase one.

Request#

If the proposer receives a promise from the majority of the acceptors, it will send an accept request, accept(proposal_number, value), to those acceptors. The accept request includes value in addition to the proposal_number. The proposed value is selected based on the following conditions:

  • Condition 1: If the proposer didn’t receive any accepted proposals along with the promise message in response to the prepare requests, then it will propose a value of its choice.
  • Condition 2: If the proposer received already accepted proposals by the acceptors, along with the promise messages in response to the prepare requests, then it has to select the value with the highest proposal_number from the received accepted proposals and propose the selected value.
'''
Request by a proposer to a set of acceptors containing majority acceptors
'''

if proposer.noOf_promises_received >= majority:
    if proposer.received_promises contain accepted_values:
        proposer.send_accept(proposal_number, accepted_value against max(received_accepted_proposal_numbers)
    else:
        proposer.send_accept(proposal_number, value_of_proposers_choice)
def send_accept(proposal_number, value):
    '''
    Broadcasts an accept message to all acceptors that promised on proposal_number.
    '''

Response#

An acceptor, on receiving an accept request, responds to the proposer based on the following conditions:

  • Condition 1: If the latest proposal number promised by the acceptor is the same as in the accept request, it will accept the request and respond to the proposer with an accepted message. If the proposer receives majority acceptance of the proposed value, that value is chosen and broadcast to all the group members (learners).

    Note: The acceptors also broadcast the accepted message to a set of learners, ensuring that if one member (the proposer that receives majority acceptance) fails, the consensus state is not lost.

def recv_accept(proposer_ID, proposal_number, value):
    '''
    Called when an Accept! message is received from a Proposer
    '''
    if proposal_number == acceptor.promised_proposal:
        acceptor.accepted_proposal_number = proposal_number
        acceptor.accepted_value = value
        acceptor.send_accepted(proposer_ID, proposal_number, acceptor.accepted_value)
        
def send_accepted(proposer_ID, proposal_number, accepted_value):
    '''
    Broadcasts an Accepted message to the specified Proposer and to all Learners
    '''

  • Condition 2: If the acceptor has promised to a new proposal with a higher proposal number, it will reject the accept request. The proposer should start afresh using a proposal number higher than the one received in the reject message.
def recv_accept(proposer_ID, proposal_number, value):
    '''
    Called when an Accept! message is received from a Proposer
    '''
    if proposal_number < acceptor.promised_proposal:
        acceptor.accepted_proposal_number = proposal_number
        acceptor.accepted_value = value
        acceptor.reject(proposer_ID, proposal_number, )
        
def reject(proposer_ID, proposal_number, promised_proposal_number):
    '''
    Sends an error message to the specified Proposer
    '''
    print("Try with a proposal number higher than " + promised_proposal_number "!")

Let’s summarize the basic Paxos protocol in the following illustration:

Normal operation of basic Paxos
Normal operation of basic Paxos

Points to ponder

Question 3

How are old proposals rejected when new ones have arrived?

Hide Answer

The new proposals will eventually use a larger proposal number and any delayed proposals with smaller proposal numbers will be rejected.

3 of 3

In the next lesson, we will see basic Paxos in action using various examples.

Introduction to Paxos

Basic Paxos in Action